DNA sequence and expression of the B95-8
Epstein-Barr virus genome
The complete (172,282 base pairs) nucleotide sequence of the B95-8 strain of Epstein-Barr virus has been established
using the dideoxynucleotide/M13 sequencing procedure. Many RNA polymerase II promoters have been mapped and the
mRNAs from these promoters have been assigned to the latent or early/late productive virus cycles. Likely protein-coding
regions have been identified and three of these have been shown to encode a ribonucleotide reductase, a DNA polymerase 
and two surface glycoproteins. 



Epstein-Barr virus (EBV) is a human herpesvirus which is
endemic in all human populations. Most people are infected
with the virus in early childhood and then carry the virus for
life. If the initial infection is delayed until adolescence,
infectious mononucleosis frequently results. The virus is also linked
with certain kinds of cancer. In the malarial belt of Africa, EBV
is a contributory factor in the development of Burkitt's lymphoma
and in South-East Asia the virus is linked to the
high incidence of undifferentiated nasopharyngeal carcinomas
(reviewed in refs 2, 3).

The lack of a simple permissive tissue system for EBV makes 
it difficult to obtain large amounts of virus and hampers genetic 
analysis. Until recently no genes had been located on the viral 
genome. Only primate B lymphocytes appear to have receptors
for the virus but in vivo both B lymphocytes and certain
(possibly epithelial) cells in the oropharynx become infected.
There are many different EBV-infected lymphoid cell lines and
these derive either from Burkitt's lymphoma explants or from
B lymphocytes infected in vitro with EBV. When B lymphocytes
are infected with EBV they are efficiently immortalized to
perpetual growth. The B95-8 cell line was established by infecting
marmoset B lymphocytes with EBV from a human with infectious
mononucleosis. Only a small proportion of the B95-8 cells
support virus production spontaneously; the remainder are
considered to be latently infected. The proportion of cells producing
EBV can be substantially increased by adding various inducers
such as the tumour promoter 12-O-tetradecanoylphorbol-13-acetate
(TPA).

EBV has a double-stranded DNA genome about 172 kilobases
(kb) long which is linear in the virus particle but exists as
a circular episome inside the nucleus of the infected cell.
The circularization is mediated by means of multiple direct
sequence repeats about 0.5 kb long at the ends of the linear
form which become joined in the circular form. The
genome is further divided into a short and a long unique region
by direct sequence repeats (up to 12) of ~3 kb. Unlike other
herpesviruses, the short (U_S) and long (U_L) unique regions of
EBV are maintained in a unique orientation relative to each
other. In most EBV-containing cell lines (including B95-8) the
majority of EBV DNA is episomal and it is generally thought
that most gene expression in such lines is from this form.

* Present addresses: Memorial Sloan-Kettering Cancer Center, 1275
York Avenue, New York, New York 10021, USA (R.B.); LSU Medical
Center, 1901 Perdido St, New Orleans, Louisiana 70112, USA (P.L.D.);
Cellular Regulation Section, Laboratory of Biochemistry, NCI, National
Institutes of Health, Bethesda, Maryland 20205, USA (C.S.).

The EcoRI and BamHI restriction fragments of the virus
have been cloned and restriction maps for these enzymes
obtained. By using these cloned fragments as probes in
Northern blotting experiments, many viral mRNAs have been
approximately localized on the viral genome. As a result of
hybrid-selected translation experiments on mRNA from
EBV-infected cells, many viral proteins have been assigned to regions
of the viral genome. The genome is thought to be transcribed
by the cellular RNA polymerases II and III.

We have now determined the complete DNA sequence of the 
B95-8 strain of EBV and analysed the sequence for likely coding 
regions and transcription promoters, splice junctions and 
poly(A) addition sites. This information is being used to analyse 
the transcription and gene expression of EBV. Here we show 
the overall arrangement of possible protein-coding regions and 
summarize our present knowledge of the transcription and trans- 
lation of the virus. 

DNA sequence analysis 

M13 subclone libraries were constructed from suitable EcoRI
or BamHI fragments of B95-8 EBV by the sonication method
using the M13mp8 and M13mp9 vectors. These M13 clones
were sequenced randomly by the dideoxynucleotide procedure
and the data compiled in a DEC VAX computer using the
programs of Staden. About 95% of the sequence of each region
was obtained on both strands by the random procedure and the
final single-stranded areas or ambiguities were resolved by more
directed methods. The methods used have been reviewed by
Bankier and Barrell. The sequence of each nucleotide was
determined on average 7.3 times and, apart from small parts of
certain repetitive regions, determined on both strands. To join
up the sequences of adjacent EcoRI or BamHI clones, random
sequence analysis of a clone overlapping the junction was
performed. Restriction maps deduced from the DNA sequence are
shown in Fig. 1.
Fig. 1 Arrangement of restriction sites, major open reading frames,
promoters and other features in the B95-8 EBV genome. Restriction
maps deduced from the DNA sequence are shown for the enzymes
HindIII, SalI, EcoRI and BamHI beneath the scale in kb. Position 0
on the scale is the EcoRI site between EcoRI Dhet and EcoRI I; this
is 51 bases to the right of the true left end of the short unique
region. Tandemly repeated regions of the genome are shaded. Reading
frames are indicated by horizontal arrows and AATAAA (mRNA 3' ends
and poly(A) sites) by vertical arrows. The rarely used ATTAAA 3' end
signal is marked by a separate symbol. Promoters confirmed to
function in B95-8 cells and promoters surmised from the DNA sequence
are indicated by different symbols. The reading frames shown in the
diagram were selected on the grounds of length, codon usage,
initiator methionine content and position relative to each other and
to transcription signals. Confirmed promoters have been mapped by a
combination of S1 mapping and primer extension on RNA from B95-8
cells and in some cases by in vitro transcription. Possible promoters
have been predicted from position and sequences homologous to the
TATA box. The complete DNA sequence and feature table have been
deposited with the EMBL Nucleotide Sequence Data Library, Postfach
10.2209, 6900 Heidelberg, FRG.



Table 1 Coordinates of the starts and ends of the reading frames indicated in Fig. 1 and the
calculated molecular weights (MW) of the predicted polypeptides

Frame   Start    End      MW       Promoter  Position  Stage   Comments
BNRF1   1,736    5,689    142,843  -         -         -
BCRF1   9,675    10,184   19,914   -         -         -
BWRF1   12,541   13,689   39,866   -         -         -       12 copies
BYRF1   48,429   49,964   55,173   -         -         -
BHRF1   54,376   54,948   21,893   -         -         -
BHLF1   52,557   50,578   66,244   BH-L1     52,786    Early
BFLF2   56,935   55,982   35,361   -         -         -
BFLF1   58,525   56,951   57,912   -         -         -
BFRF1   58,891   59,898   37,632   -         -         -
BFRF2   59,610   61,580   70,942   -         -         -
BFRF3   61,456   62,034   19,966   -         -         -
BPLF1   71,527   62,081   337,973  -         -         -
BOLF1   75,239   71,523   132,750  -         -         -
BORF1   75,238   76,329   39,191   BO-R1     75,052    Late
BORF2   76,407   78,884   93,030   BO-R2     76,198    Early   Ribonucleotide reductase
BaRF1   78,900   79,805   34,358   Ba-R1     78,838    Early   Ribonucleotide reductase
BMRF1   79,899   81,110   43,373   BM-R1     79,871    Early
BMRF2   81,118   82,188   39,515   BM-R2     80,811    Late
BMLF1   84,122   82,746   51,347   -         -         -
BSLF2   84,288   84,229   2,162    -         -         -
BSLF1   86,881   84,260   98,040   -         -         -
BSRF1   86,924   87,577   23,861   BS-R1     86,918    Late
BLLF2   88,474   87,641   30,952   BL-L3     88,481    Early
BLRF1   88,547   88,852   10,944   BL-R1     88,539    Late
BLRF2   88,925   89,410   17,687   BL-R2     88,896    Late
BLLF1a  92,153   89,433   94,431   BL-L1     92,158    Late    gp350
BLLF1b  92,153   89,433   75,171   -         -         -       gp220
BLLF3   90,013   89,569   16,652   BL-L2     90,020    Early
BLRF3   92,243   92,599   12,832   -         -         -
BERF1   92,646   95,162   92,314   -         -         -       Homologous with BERF2b, BERF4
BERF2a  95,353   95,721   13,186   -         -         -
BERF2b  95,725   98,244   92,769   -         -         -       Homologous with BERF1, BERF4
BERF3   98,323   98,766   16,717   -         -         -
BERF4   98,805   101,420  95,543   -         -         -       Homologous with BERF1, BERF2b
BZLF2   102,116  101,448  25,257   -         -         -
BZLF1   103,155  102,556  21,482   -         -         -
BRLF1   105,183  103,369  66,594   -         -         -
BRLF2   104,989  104,927  2,343    -         -         -
BRRF1   105,182  106,111  35,319   -         -         -
BRRF2   106,302  107,912  56,954   -         -         -
BKRF1   107,950  109,872  56,427   -         -         -       EBNA
BKRF2   109,958  110,368  15,080   -         -         -
BKRF3   110,275  111,117  31,606   -         -         -
BKRF4   111,107  111,784  24,837   -         -         -
BBLF4   114,259  111,833  89,853   -         -         -
BBRF1   114,204  116,042  68,456   -         -         -
BBRF2   115,843  116,781  34,916   -         -         -
BBLF3   117,386  116,784  22,605   -         -         -
BBLF2   119,080  117,416  60,364   -         -         -
-       -        -        -        BB-R1     119,014   Late
BBRF3   119,137  120,351  45,792   BB-R3     119,133   Late
BBLF1   120,974  120,750  8,470    BB-L1     121,300   Late
BGLF5   122,341  120,932  52,666   -         -         -
BGLF4   123,692  122,328  51,291   -         -         -
BGLF3   124,939  123,944  37,708   -         -         -
BGRF1   124,938  125,912  36,462   -         -         -
BGLF2   126,873  125,866  36,888   EE-L8     126,895   Late
BGLF1   128,374  126,854  54,462   -         -         -
BDLF4   129,021  128,347  25,448   -         -         -
BDRF1   129,188  130,348  42,626   -         -         -
BDLF3   131,066  130,365  23,791   EE-L4     131,073   Late    Glycoprotein?
BDLF2   132,389  131,130  46,168   -         -         -
BDLF1   133,307  132,403  33,624   -         -         -
BcLF1   137,466  133,324  153,916  EH-L1     137,676   Late
BcRF1   137,862  139,715  68,711   -         -         -
BTRF1   139,642  140,916  46,711   -         -         -
BXLF2   143,036  140,919  78,321   ECU       143,274   Late
BXLF1   144,861  143,041  67,193   -         -         -
BXRF1   144,860  145,603  27,063   -         -         -
BVRF1   145,416  147,125  62,461   -         -         -
BVRF2   147,927  149,741  64,102   -         -         -
BdRF1   148,707  149,741  36,127   EC-R1     148,651   Late
BILF2   150,525  149,782  27,076   -         -         -       Glycoprotein?
BILF1   153,099  152,164  34,519   -         -         -       Membrane protein?
BALF5   156,746  153,702  113,419  -         -         -       DNA polymerase
BALF4   159,322  156,752  95,640   EC-L1     159,337   Late    Glycoprotein?
BALF3   161,678  159,312  85,536   -         -         -
BALF2   164,770  161,387  123,122  -         -         -
BALF1   165,517  164,858  25,149   -         -         -
BARF1   165,504  166,166  24,471   ED-R1     165,498   Early
BNLF2b  167,303  167,001  11,449   -         -         -
BNLF2a  167,486  167,307  6,540    ED-L2     167,495   Early
BNLF1c  168,966  168,163  28,851   -         -         -
BNLF1b  169,129  169,043  3,212    -         -         -
BNLF1a  169,474  169,208  9,942    ED-L1     169,517   Latent


BLLF1a and b are overlapping co-terminal reading frames, BLLF1b having a central portion
spliced out. The locations of the approximate transcription start points of the confirmed 
promoters and the stage of expression of mRNAs from these promoters are given. Reading 
frames identified as coding for known proteins are indicated. Reading frames rich in the 
glycosylation site sequence (N-X-T/S) are also noted, together with a possible membrane 
protein, which has many hydrophobic amino acid residues. 



The complete DNA sequence, together with a table showing
the positions of the features shown in Fig. 1, has been deposited
with the EMBL database; for reasons of space the sequence
and feature table cannot be shown here. We have previously
published the sequences of the EcoRI Dhet, EcoRI C,
BamHI B, BamHI L, BamHI a and part of the BamHI M
regions together with some analysis of their transcription.

Most of the sequence we have established is from single EcoRI
or BamHI clones. The possibility that cloning artefacts may
have arisen in the construction and maintenance of the EcoRI
and BamHI libraries cannot be excluded, but there is no reason
to believe that such artefacts have occurred. The overlap between
the EcoRI Dhet and EcoRI I fragments was obtained by
sequencing the corresponding region of the EBV strain M-ABA,
which has a single base change in the EcoRI recognition
sequence.

There are some well documented areas of the EBV genome
in which some strains have deletions relative to most other
strains: one of these is in the BamHI WYH region and another
is in the EcoRI C region. B95-8 does not have a deletion in the
BamHI WYH region; the P3HR1 and Daudi strains, however,
are missing bases 45,644 to 52,450 and 45,415 to 52,824 of our
sequence respectively. There is a deletion of ~13.6 kb in
B95-8 relative to most strains and we have determined that it
lies between bases 152,012 and 152,013 by comparison with the
sequence of the corresponding fragment of Raji DNA.
Similarly, a deletion of 2,658 base pairs (bp) in the EcoRI Dhet
fragment of Raji DNA removes bases 163,978 to 166,635 of the
B95-8 sequence.
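
The deletion sizes implied by these coordinates can be checked with a little inclusive-range arithmetic. The sketch below assumes that each quoted range names the first and last missing base (1-based, inclusive); the function name is illustrative only.

```python
def span_length(first: int, last: int) -> int:
    """Number of bases in an inclusive, 1-based coordinate range."""
    if last < first:
        raise ValueError("expected first <= last")
    return last - first + 1


# Deleted ranges quoted in the text, in B95-8 coordinates.
deletions = {
    "P3HR1 (BamHI WYH region)": (45_644, 52_450),
    "Daudi (BamHI WYH region)": (45_415, 52_824),
    "Raji (EcoRI Dhet fragment)": (163_978, 166_635),
}

for strain, (first, last) in deletions.items():
    print(f"{strain}: {span_length(first, last):,} bp")
```

Under that assumption the Raji range comes to 2,658 bp, in agreement with the figure given in the text.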

Repeat sequences 

The genome is divided into the two unique regions (U_S and U_L)
by the major internal repeat. We have avoided an earlier
nomenclature in which the genome is subdivided into smaller unique
regions by other repetitive sequences because there are many
more repeat sequences in the virus than was previously thought
and these do not appear to separate functional domains of the
virus.

In the EcoRI Dhet clone that we sequenced (derived from
the circular form of the virus) there were four copies of the
terminal repeat, three of 538 bp and one of 523 bp. In the
3.07-kb major internal repeat the sequence of only one of the
BamHI W clones was determined and for the moment we have
assumed that the repeats are identical. The number of copies
of this repeat (11.6 copies in the B95-8 strain) has been taken
from previous work. Our sequence of BamHI W is identical
to that determined by Jones and Griffin but differs in two
places from the sequence of Cheung and Kieff.

Repetitive regions are scattered throughout the genome. Some 
repeats are found in likely coding regions and it is known in 
some cases that the repeats are actually encoded into protein: 
one of the most striking of these is the BamHI K Epstein-Barr
nuclear antigen (EBNA) Gly-Ala repeat in the BKRF1 reading
frame (BKRF1 is explained below). In this repeat and several
others there is no degeneration (third position variation) in the 
repeat sequences which are coding. Some mechanism or con- 
straint apart from simply coding for protein may prevent these 
repeat sequences from drifting. 

Interpretation of the sequence 

The DNA sequence has been analysed for transcription
promoters, major open reading frames and possible
polyadenylation/mRNA processing sites. The reading frames, promoters
and AATAAA signals are shown in Fig. 1 together with other
features of the sequence such as repetitive sequences. In our
nomenclature, reading frames and promoters are preceded by
the abbreviated name of the restriction fragment in which they
start translation or transcription. A promoter starting
transcription in BamHI K is preceded by BK-, one starting in EcoRI C
is preceded by EC-. This is followed by L or R depending on
whether the promoter or reading frame is leftward or rightward
on the standard map. Promoters are then simply numbered;
reading frames have F and a number. Thus, BN-R1 would be
a rightward promoter starting transcription in BamHI N and
BCLF2 would be a leftward reading frame beginning in
BamHI C. In many cases (see below) we have demonstrated
promoters to be functional in B95-8 cells but some promoters
are still only predicted from the DNA sequence.
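
The naming scheme just described is regular enough to be decoded mechanically. The following is a minimal sketch, assuming names have exactly the shapes used in the text (for example BN-R1, EC-L1, BCLF2, BLLF1a); the regular expression, the `Feature` type and the function name are illustrative choices, not part of the original nomenclature.

```python
import re
from typing import NamedTuple

ENZYMES = {"B": "BamHI", "E": "EcoRI"}
DIRECTIONS = {"L": "leftward", "R": "rightward"}

# Enzyme letter, fragment letter, optional dash (promoters), direction,
# optional F (reading frames), number, optional lower-case suffix (BLLF1a).
NAME = re.compile(r"^([BE])([A-Za-z])(-?)([LR])(F?)(\d+)([a-z]?)$")


class Feature(NamedTuple):
    kind: str        # "promoter" or "reading frame"
    enzyme: str      # "BamHI" or "EcoRI"
    fragment: str    # restriction fragment letter, e.g. "K"
    direction: str   # "leftward" or "rightward"
    number: int
    suffix: str      # "" unless the name ends in a letter such as "a"


def parse_name(name: str) -> Feature:
    """Decode a promoter or reading-frame name into its components."""
    match = NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognised name: {name!r}")
    enzyme, fragment, dash, direction, frame, number, suffix = match.groups()
    if dash and not frame:
        kind = "promoter"
    elif frame and not dash:
        kind = "reading frame"
    else:
        raise ValueError(f"neither a promoter nor a reading frame: {name!r}")
    return Feature(kind, ENZYMES[enzyme], fragment,
                   DIRECTIONS[direction], int(number), suffix)


for example in ("BN-R1", "BCLF2", "EC-L1", "BLLF1a"):
    print(example, "->", parse_name(example))
```

Lower-case fragment letters (as in BaRF1 or BcLF1) are kept as written, since the text distinguishes them from upper-case fragments.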

On grounds of size alone it is likely that all or large parts of 
the reading frames shown are expressed as protein. Their posi- 
tions with respect to each other and to promoter and polyadenyl- 
ation/RNA processing sites strengthen this argument. Because 
of the high G+C content (59.94% in B95-8) we have been able
to use codon usage analysis to further justify many of the reading
frames. There are 84 unique major open reading frames shown
in Fig. 1 and the coordinates and sizes of the reading frames
and locations of known transcription starts and various other
features are shown in Table 1. The sequence is numbered from
the EcoRI site separating EcoRI Dhet from EcoRI I.
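
One simple statistic in the spirit of codon usage analysis is the G+C fraction at each codon position of a candidate frame. The sketch below is only an illustration of that idea on a toy sequence; it is not the program used for the analysis described here, and the function names and the toy sequence are assumptions.

```python
from typing import Tuple


def gc_fraction(seq: str) -> float:
    """Fraction of unambiguous bases (A, C, G, T) that are G or C."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    return sum(b in "GC" for b in bases) / len(bases)


def gc_by_codon_position(orf: str) -> Tuple[float, float, float]:
    """G+C fraction at codon positions 1, 2 and 3 of an in-frame sequence."""
    orf = orf.upper()
    usable = len(orf) - len(orf) % 3  # ignore a trailing partial codon
    if usable == 0:
        raise ValueError("sequence is shorter than one codon")
    first, second, third = (gc_fraction(orf[i:usable:3]) for i in range(3))
    return first, second, third


toy_frame = "ATGGCCGGCGCGTAA"  # hypothetical five-codon example
print("overall G+C:", round(gc_fraction(toy_frame), 3))
print("G+C by codon position:", gc_by_codon_position(toy_frame))
```

Comparing the three position-specific values across the six possible frames of a region is one way such a statistic might be used to favour one frame over the others.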

Generally the arrangement of coding regions in the unique 
regions is economical, particularly in the large unique region. 
Often there are few or no bases between reading frames and 
some promoters are found to lie in the coding regions of the 
adjacent genes. This economical arrangement in the unique 
regions contrasts with the large repetitive regions of the virus. 
The number of major internal repeat units is variable in different 
strains and tends to compensate for the deletions that
certain strains have; this may indicate a packing constraint on 
the size of the virus. 

Gene expression 

The total number of proteins expressed from the EBV genome 
is unknown. Of the 84 major open reading frames, some may 
be spliced together, tending to reduce the total number of 
proteins expressed by the virus, while alternate splicing patterns 
would increase the number of proteins. Some proteins (discussed 
below) are thought to correspond to the major antigens of the 
virus, EBNA, MA (membrane antigen) and the capsid proteins
(VCA). Other classically identified antigens such as the EA
(early antigen) group and LYDMA (lymphocyte-detected membrane
antigen), remain to be identified as polypeptide chains.
A large number of EBV-specific polypeptides have been mapped
to particular regions of the virus by hybrid-selected translation,
though in some regions of the virus no proteins have
yet been detected by this method. For example, the two very
large leftward reading frames BPLF1 and BOLF1 would encode
polypeptides of molecular weight 338,000 and 133,000, respectively.
However, no proteins have been reported as originating
from this substantial region of the virus, though the fact that 
the reading frames are so large makes it fairly evident that they 
are expressed. 
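
The sizes quoted for BPLF1 and BOLF1 can be sanity-checked from their Table 1 coordinates. The sketch below assumes an average residue mass of about 110 Da (a common rule of thumb, not a value from this paper) and counts every codon in the coordinate range, so it gives only a rough upper estimate.

```python
AVERAGE_RESIDUE_MASS_DA = 110.0  # assumption: rule-of-thumb average, not from the paper


def orf_codons(start: int, end: int) -> int:
    """Codons spanned by a reading frame, for either orientation."""
    length = abs(end - start) + 1  # coordinates are inclusive
    if length % 3 != 0:
        raise ValueError(f"length {length} is not a whole number of codons")
    return length // 3


def rough_molecular_weight(start: int, end: int) -> float:
    """Very rough polypeptide mass in daltons from the frame length."""
    return orf_codons(start, end) * AVERAGE_RESIDUE_MASS_DA


frames = [
    ("BPLF1", 71_527, 62_081, 338_000),
    ("BOLF1", 75_239, 71_523, 133_000),
]
for name, start, end, quoted in frames:
    estimate = rough_molecular_weight(start, end)
    print(f"{name}: {orf_codons(start, end):,} codons, "
          f"~{estimate:,.0f} Da estimated, {quoted:,} quoted")
```

The estimates land within a few per cent of the quoted values, which is as close as a fixed per-residue mass can be expected to get.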

The EBV genome is thought to be transcribed by the cellular 
RNA polymerases II and III. RNA polymerase III transcribes 
the two Epstein-Barr early region (EBER) RNA genes in the 
small unique region. There is no evidence of any tRNA
genes in EBV and our computer searches of the sequence have
revealed none. The function of the EBER RNAs is not directly
known but they will substitute for the adenovirus VA RNAs in
an adenovirus infection. The EBER RNA genes are localized
next to the maintenance origin of replication of EBV mapped
by Yates et al.

We have searched for RNA polymerase II (pol II) promoters
by a combination of identifying 5' ends of EBV mRNAs using
S1 mapping and primer extension experiments and by in vitro
transcription of EBV DNA fragments in a HeLa whole-cell
extract. So far, 24 pol II promoters have been detected but the
final total is expected to be more than twice this number. The
locations of the transcription starts of these promoters (which
are all confirmed to function in B95-8 cells) are given in Table
1. All the promoters detected so far have sequences about 30
bases upstream of their transcription starts that are homologous
with the TATAAAA sequence.
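
A search for TATA-like sequences about 30 bases upstream of a mapped transcription start can be sketched as below. The motif pattern `TATA[AT]A`, the 20 to 40 base window and the function names are simplifying assumptions for illustration; the criteria actually used to judge homology are not specified here. Leftward promoters would need the sequence reverse-complemented first.

```python
import re
from typing import List

# Assumption: a deliberately simple TATA-like pattern for illustration.
TATA_LIKE = re.compile(r"(?=TATA[AT]A)")  # lookahead allows overlapping hits
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement, for examining leftward promoters."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def upstream_tata(seq: str, tss: int, nearest: int = 20, farthest: int = 40) -> List[int]:
    """Distances (in bases upstream of index `tss`) of TATA-like motif starts.

    `seq` is read left to right and `tss` is the 0-based index of the
    transcription start; only motifs starting between `nearest` and
    `farthest` bases upstream are reported.
    """
    hits = []
    for match in TATA_LIKE.finditer(seq.upper()):
        distance = tss - match.start()
        if nearest <= distance <= farthest:
            hits.append(distance)
    return hits


# Hypothetical toy sequence with a TATAAAA element 30 bases upstream of the start.
toy = "G" * 50 + "TATAAAA" + "C" * 23 + "A" + "G" * 20
print(upstream_tata(toy, tss=80))
```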

RNAs have been classified as expressed in the latent cycle or
early or late productive cycles. We have compared the levels
of RNAs in B95-8 cells either not treated or treated with TPA.
Although the control B95-8 cells produce virus, the levels of
productive cycle RNAs are low in control cultures and increase
very dramatically on TPA treatment so, in practice, it is easy to
distinguish latent cycle RNAs from productive cycle ones in
this system. Phosphonoacetic acid (PAA) is used as an inhibitor
of viral DNA synthesis to distinguish between early productive
cycle RNAs and late productive cycle ones. PAA prevents the
TPA induction of late RNAs but not of early RNAs.

It is unknown whether the regulated expression of EBV genes
is controlled at the transcriptional level or through RNA processing
or stability; there is a precedent for transcriptional regulation
in herpes simplex virus (HSV). We previously noted
homologous DNA sequences upstream of some EBV promoters
which might be used to regulate those genes at the
transcriptional level. All the promoters which had those sequences
give rise to late RNAs but only a few of the late promoters
have the sequences, so it remains to be seen whether they are
functionally important. The promoters are classified as latent,
early or late in Table 1 though at present this classification refers
to when the corresponding RNA is observed, rather than proven
regulation of the promoter activity.

Genes active in latently infected cells 

Significant levels of mRNA are expressed from three regions of
the genome in latently infected cells in addition to the pol III
EBER RNA transcripts: one of these regions is the BamHI
WYH region where mRNA is latently transcribed rightward
possibly from the in vitro promoter in BamHI W (transcription
start at position 45,104) or from the small unique region. A
mature latent mRNA of 3.0 kb containing exons in BamHI W, Y
and H has been described. We have confirmed that there is a
strong in vitro promoter starting transcription at position 45,104
but have been unable to show that it operates in B95-8 cells.
Our preliminary Northern blotting experiments in the
BamHI W, Y, H region reveal two spliced rightward latent RNAs
of 2.4 and 3.7 kb. The deletion in P3HR1 which may account
for the nontransforming phenotype of this strain affects the
region where these mRNAs map as well as the early RNA
transcribed across the 125-bp repeats, encoding BHLF1.

A second latently transcribed region is BamHI K. A 3.7-kb
latent RNA hybridizes with BamHI K which contains a simple
repeat sequence 708 bases long, consisting only of the triplets
GGA, GGG and GCA. This repeat sequence lies in the BKRF1
reading frame and codes for an amino acid sequence consisting
of only Gly and Ala. This region has been conclusively shown
to code for the nuclear antigen EBNA-1 (refs 44, 60). The
calculated molecular weight of the BKRF1 product is 56,427, which
is lower than the observed 68,000-85,000 of the EBNA-1 protein.
The molecular weight of EBNA varies in a strain-specific manner
according to the number of repeats in the BKRF1 frame. The
location of the promoter for latent transcription of the EBNA-1
gene is not yet known but may lie several kilobases upstream
of the start of the BKRF1 frame. The observation that a
protein very similar to EBNA is produced in response to
transfection of a BamHI-HindIII fragment (positions 107,565-
110,491) implies that BKRF1 accounts for most of the coding
sequence. Another EBNA (EBNA-2) has been identified, though
the gene for this has not yet been accurately localized on the
genome map. A third nuclear antigen appears to map to the
BamHI M region.
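
The statement that the repeat encodes only Gly and Ala follows directly from the standard genetic code, in which GGA and GGG specify glycine and GCA specifies alanine. The sketch below translates a repeat made of those triplets and works out how much of BKRF1 the 708-base repeat occupies; the toy repeat unit and the function name are assumptions for illustration.

```python
# Standard genetic code entries for the three triplets named in the text.
CODON_TO_RESIDUE = {"GGA": "Gly", "GGG": "Gly", "GCA": "Ala"}


def translate_repeat(dna: str) -> list:
    """Translate a sequence built only from GGA, GGG and GCA triplets."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("length is not a whole number of codons")
    return [CODON_TO_RESIDUE[dna[i:i + 3]] for i in range(0, len(dna), 3)]


toy_repeat = "GGAGCAGGGGCAGGA"  # hypothetical fragment, not the real sequence
print(translate_repeat(toy_repeat))

REPEAT_BASES = 708
repeat_residues = REPEAT_BASES // 3
bkrf1_codons = (109_872 - 107_950 + 1) // 3  # Table 1 coordinates, inclusive
print(f"repeat: {repeat_residues} residues out of {bkrf1_codons} codons in BKRF1")
```

On these figures the repeat accounts for 236 of the 641 codons in the frame, which is why changes in repeat number can shift the size of the protein noticeably between strains.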

The most abundantly transcribed EBV mRNA in latently
infected cells is a 2.8-kb RNA encoded by EcoRI Dhet
which has been correlated with a latently active leftward
promoter at position 169,513 (ED-L1). In the latent virus cycle
the RNA from this promoter is spliced and most of the mRNA
is composed of an exon containing the BNLF1c reading frame
and terminating at the poly(A) addition site at 166,950. This
mRNA probably codes for a 42,000-molecular weight membrane
protein. The 5' end of this mRNA may be different in the
productively infected cell.
Early productive cycle genes

Productive cycle RNAs are induced in B95-8 cells by treatment
with TPA and induction of early RNAs is not inhibited by
blocking viral DNA synthesis with PAA. The functions of three
reading frames in B95-8 EBV have been identified by comparing
the protein-coding sequences of all the reading frames in EBV
with a library of known protein sequences. RNAs encoding
these reading frames are expressed in the early productive cycle.
We found that the reading frame BALF5 is similar to the HSV
DNA polymerase (J. Quinn and D. McGeoch, personal
communication) and the reading frames BORF2 and BaRF1 are
similar (112 amino acids out of 501) to the HSV ribonucleotide
reductase gene region. The EBV reading frame BXLF1 has a
small region of homology (19 amino acids out of 35) with the
HSV2 thymidine kinase gene. This sequence, however, may
represent a nucleotide-binding site rather than necessarily
imply a thymidine kinase function for BXLF1.

Late productive cycle genes 

The major component of the virus capsid is a 160,000 (160K)-
molecular weight protein and a similar-sized protein has been
observed by hybrid-selected translation using the EcoRI E
fragments. This protein may be encoded by the BcLF1 reading
frame which would give a 154K protein.

The membrane antigen (MA) of EBV contains several proteins
including gp350/300, gp250/200, p140 and gp85: three of these
are glycoproteins (gp). Antibodies to MA and to the gp350/300
neutralize viral infectivity in tissue culture. The sequences
of these proteins are of interest, therefore, in the development
of synthetic vaccines against EBV infection. Hummel et al.
have mapped gp350/300 and gp250/200 to the BamHI L region
by hybrid-selected translation and suggested that they may be
expressed from overlapping reading frames because they have
peptides in common. By Northern blotting and S1 mapping,
it has been shown that there are two co-terminal late RNAs
containing the BLLF1 reading frame, the smaller having most
or all of an internal repetitive region removed by splicing. It
is proposed that gp350/300 is expressed from a 2.8-kb mRNA
and gp250/200 is expressed from a 2.2-kb mRNA which is
spliced, removing the repetitive region.

Although the location of the gene for the gp85 protein is 
unknown, two obvious candidates are the BALF4 and the 
BDLF3 reading frames as they are about the right size and 
contain many potential glycosylation sites. The non-glycosylated
membrane protein p140 is 143K in size and a similar-sized viral
protein was mapped to the short unique region; this protein
is probably encoded by BNRF1.

Conclusions

Our structural approach to the biology of EBV, making use of
the complete DNA sequence, has been particularly useful
because of the lack of viral genetics and because of technical
obstacles to working with the virus in tissue culture. The library
of over 6,000 characterized M13 clones covering the genome
obtained from the sequencing programme is proving most useful
in analysing the gene expression of the virus. By using the M13
clones as probes for S1 mapping and Northern blotting
experiments, we are now constructing a detailed transcription map of
the virus.

One of the most interesting features of EBV is its ability to 
immortalize B lymphocytes. Only a few regions of the genome 
are expressed in the latently infected lymphocyte: some of these 
may be involved in maintenance of the viral DNA and one at 
least is presumably involved in the immortalization. Identifying 
the immortalizing function may be useful both technically for 
making human monoclonal antibody lines that do not secrete 
EBV and with respect to the involvement of the virus in 
oncogenesis. EBV does not seem to be the proximal cause of 
Burkitt's lymphoma (reviewed in ref. 68) but is nevertheless a 
contributory risk factor. The fact that virtually every South-East 
Asian undifferentiated nasopharyngeal carcinoma carries EBV 
DNA implies a link between the virus and this disease. A region 
of the EBV genome which immortalizes epithelial cells has been 
identified but immortalization of lymphocytes may require
other EBV genes. At present it seems that an understanding of 
the role of the virus in these diseases will probably derive from 
a future investigation of the detailed molecular biology of 
EBV. 
Epstein-Barr virus (EBV) utilizes a completely different mode of DNA replication during the lytic cycle than
that employed during latency. The latency origin of replication, ori-P, which functions in the replication of the
latent episomal form of the EBV genome, requires only a single virally encoded protein, EBNA-1, for its
activity. During the lytic cycle, a separate origin, ori-Lyt, is utilized. Relatively little is known about the
trans-acting proteins involved in ori-Lyt replication. We established a cotransfection-replication assay to
identify EBV genes whose products are required for replication of ori-Lyt. In this assay, a BamHI-H plasmid
containing ori-Lyt was replicated in Vero cells cotransfected with the BamHI-H target, the three EBV
lytic-cycle transactivators Zta, Rta, and Mta, and the EBV genome provided in the form of a set of six
overlapping cosmid clones. By removing individual cosmids from the cotransfection mixture, we found that
only three of the six cosmids were necessary for ori-Lyt replication. Subcloning of the essential cosmids led to
the identification of six EBV genes that encode replication proteins. These genes and their functions (either
known or predicted on the basis of sequence comparison with herpes simplex virus) are BALF5, the DNA
polymerase; BALF2, the single-stranded DNA-binding protein homolog; BMRF1, the DNA polymerase
processivity factor; BSLF1 and BBLF4, the primase and helicase homologs; and BBLF2/3, a potential homolog
of the third component of the helicase-primase complex. In addition, ori-Lyt replication in this cotransfection
assay was also dependent on one or more genes provided by the EBV SalI-F fragment and on the three
lytic-cycle transactivators Zta, Rta, and Mta.



Epstein-Barr virus (EBV), like all herpesviruses, has both 
a latent state and a lytic replicative cycle. In latently infected 
B cells, multiple copies of the viral genome are maintained 
predominantly as nucleosome-covered episomes that are 
replicated in synchrony with cell division (73). Latency 
replication proceeds from ori-P, which is composed of two 
domains (72). Region I, the family of repeats, contains 20
tandemly repeated binding sites for EBNA-1 (57) and functions
as an EBNA-1-dependent enhancer whose activity is
important for both ori-P replication and transcriptional
activation of the BamHI-C latency promoter (58, 63, 71). Region
II, the dyad symmetry, contains two pairs of overlapping
EBNA-1-binding sites (57) and is the site of initiation of
latency replication (26). EBNA-1 is the only virally encoded
protein required for replication of the episomal EBV genome;
all other proteins, including the DNA polymerase,
are provided by the host cell (74).

Lytic EBV replication occurs in mucosal epithelial cells of 
the oropharynx and genital tract (60) and can be activated in 
latently infected B cells in culture by treatment with phorbol 
esters (77), by superinfection with the P3HR-1 strain which 
carries defective rearranged viral genomes (53), or by intro- 
duction of the EBV Zta transactivator (13). Because of the 
limitations imposed by the lack of an EBV-infectable epithe- 
lial culture system, information on EBV lytic viral replica- 
tion has been obtained predominantly in B-cell cultures 
undergoing reactivation. In this system, the transition from 
latency to a lytic replicative cycle is mediated by three viral 
regulatory proteins, the Zta (BZLF1, EB1, or ZEBRA) and 
Rta (BRLF1) transcriptional transactivators (9, 12, 13, 31, 
34, 48) and Mta (BMLF1), which has a posttranscriptional 
mechanism of action (5, 39, 49). The concerted action of 
these three proteins results in activation of the complete 
cascade of early and late EBV gene expression. 

Lytic DNA replication proceeds from a separate origin, 
ori-Lyt, and results in 100- to 1,000-fold amplification of the 
genome via concatemeric intermediates (18, 32, 59). In the 
prototype EBV genome, there are two copies of ori-Lyt, one 
in DS-L and one in DS-R. However, one copy is sufficient 
for lytic-cycle replication as exemplified by the EBV strain 
B95-8, which contains only DS-L. ori-Lyt covers 690 bp and 
can be divided into three essential domains. (i) The first 
domain is the promoter and leader of the BHLF1 gene, whose 
transcript is the most abundantly synthesized of the lytic- 
cycle mRNAs (38). (In the DS-R origin, this promoter 
controls the related PstI repeat gene.) The BHLF1 promoter 
contains four binding sites for the Zta transactivator and is 
strongly Zta responsive in transient expression assays (47, 
48). (ii) The second domain is a central 225-bp region whose 
prominent features include two related AT-rich palindromes 
of 18 and 20 bp and an adjacent polypurine-polypyrimidine 
tract. Elements of this type destabilize helical structure and 
may serve as sites for the initiation or transmission of 
localized unwinding in origins of replication (42, 68). (iii) The 
third domain is a powerful enhancer element that responds 
to the Rta transactivator and contains two binding sites for 
Rta and one for Zta (14, 31, 48). Hammerschmidt and 
Sugden (32) found that origin function was retained when 
this enhancer was replaced with the enhancer from the 
human cytomegalovirus (HCMV) major immediate-early 
gene, suggesting that it was enhancer function per se that 
was provided by this region rather than a contribution 
involving specific protein interactions. 
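
The central domain of ori-Lyt is described above as containing two AT-rich palindromes of 18 and 20 bp. As a rough illustration of what "palindrome" means for double-stranded DNA (a sequence equal to its own reverse complement), the following Python sketch scans a sequence for such elements. The function names, the 70% AT threshold, and the toy sequence are assumptions made for illustration; they are not taken from the paper, and the real ori-Lyt sequence is not reproduced here.

```python
# Minimal sketch: locate AT-rich DNA palindromes (sequence == reverse complement).
# Window lengths default to the 18- and 20-bp sizes mentioned in the text;
# the AT-content cutoff (min_at) is an arbitrary, illustrative assumption.

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase or lowercase DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def at_fraction(seq: str) -> float:
    """Fraction of A and T residues in the sequence."""
    seq = seq.upper()
    return (seq.count("A") + seq.count("T")) / len(seq)


def find_at_rich_palindromes(seq: str, lengths=(18, 20), min_at=0.7):
    """Return (start_index, window) for every palindromic, AT-rich window."""
    seq = seq.upper()
    hits = []
    for n in lengths:
        for i in range(len(seq) - n + 1):
            window = seq[i:i + n]
            if window == reverse_complement(window) and at_fraction(window) >= min_at:
                hits.append((i, window))
    return hits


if __name__ == "__main__":
    # Toy sequence (hypothetical): an 18-bp AT-only palindrome flanked by G runs.
    toy = "GGGG" + "AAATTTAAATTTAAATTT" + "GGGG"
    print(find_at_rich_palindromes(toy))
```

A scan of this kind only flags candidate elements; whether a given palindrome actually contributes to unwinding at the origin is an experimental question.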

The lytic origins of replication have been identified for a 
number of herpesviruses, including HCMV and simian cy- 
tomegalovirus (SCMV) (2, 33), varicella-zoster virus (VZV) 
(17), pseudorabies virus (43), equine herpesvirus 1 (4), 
Marek's disease virus (7), and herpes simplex virus (HSV) 
(51, 62, 67). Of these herpesviruses, replication of HSV has 
been the most extensively characterized. The HSV viral 
proteins involved in DNA replication were originally recog- 
nized through genetic studies (reviewed in reference 66). 
Subsequently, a complete set of seven essential genes was 
identified by Challberg et al. (8, 52, 70) using transient 
cotransfection replication assays. 

We established a transient replication assay in Vero cells 
to determine exactly which EBV proteins are required to 
replicate an ori-Lyt-containing target. Utilizing this system, 
we identified six essential EBV replication genes. In addition 
to these genes, ori-Lyt replication in the transient cotrans- 
fection assay was also dependent on one or more genes 
provided by EBV SalI-F and on the Zta, Rta, and Mta 
lytic-cycle transactivators. 

MATERIALS AND METHODS 

Cells and DNA transfections. Vero cells were maintained in 
Dulbecco modified Eagle medium plus 10% fetal calf serum. 
One day before transfection, 10^ cells were plated in 100-mm 
dishes. Four hours before transfection, the medium was 
replaced with 10 ml of Dulbecco modified Eagle medium 
containing 10% fetal calf serum and antibiotics. DNA was 
transfected by using the calcium phosphate procedure orig- 
inally described by Graham and van der Eb (30) as modified 
by Chen and Okayama (11). DNA (12.5 to 14.5 µg) was 
diluted with water to a total volume of 450 µl. To this was 
added 50 µl of 2.5 M CaCl2 and 500 µl of 2x BES [N,N- 
bis(2-hydroxyethyl)-2-aminoethanesulfonic acid]-buffered 
saline (50 mM BES [pH 6.95], 280 mM NaCl, 1.5 mM 
Na2HPO4). This cocktail was incubated at room temperature 
for 20 min and then added dropwise to the cells. After 
incubation for 20 h at 35°C in 3.5% CO2, the medium was 
removed, the cells were washed once with phosphate-buff- 
ered saline (PBS), and fresh medium containing antibiotics 
was added. The cells were harvested after a further 72-h 
incubation. 
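
The volumes given above imply the final composition of the 1-ml transfection cocktail. The short Python sketch below works through that dilution arithmetic. It assumes that the concentrations listed in parentheses for the BES-buffered saline refer to the 2x stock, and that the DNA solution contributes only volume; the function and variable names are illustrative.

```python
# Minimal sketch of the dilution arithmetic for the calcium phosphate cocktail.
# Assumption: the listed buffer concentrations are those of the 2x stock.

def final_concentration(stock_mM: float, added_ul: float, total_ul: float) -> float:
    """C_final = C_stock * V_added / V_total."""
    return stock_mM * added_ul / total_ul


DNA_UL = 450.0    # DNA diluted in water
CACL2_UL = 50.0   # 2.5 M CaCl2
BES_UL = 500.0    # 2x BES-buffered saline
TOTAL_UL = DNA_UL + CACL2_UL + BES_UL


def cocktail_concentrations() -> dict:
    """Final concentrations (mM) of each component in the mixed cocktail."""
    return {
        "CaCl2": final_concentration(2500.0, CACL2_UL, TOTAL_UL),
        "BES": final_concentration(50.0, BES_UL, TOTAL_UL),
        "NaCl": final_concentration(280.0, BES_UL, TOTAL_UL),
        "Na2HPO4": final_concentration(1.5, BES_UL, TOTAL_UL),
    }


if __name__ == "__main__":
    for name, conc in cocktail_concentrations().items():
        print(f"{name}: {conc:g} mM")
```

Under these assumptions the cocktail is 125 mM CaCl2 in 1x buffered saline (25 mM BES, 140 mM NaCl, 0.75 mM Na2HPO4) before it is added to the 10 ml of medium on the cells.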

DNA replication assay. Cell pellets were resuspended in 
100 µl of PBS and then lysed in 2 ml of buffer containing 10 
mM Tris-Cl (pH 8.0), 10 mM EDTA, 2% sodium dodecyl 
sulfate (SDS), and 100 µg of proteinase K per ml (8). After 
overnight incubation at 37°C, the samples were diluted to 4 
ml with water. Sodium acetate (pH 5.2) was added to a final 
concentration of 0.3 M, and the samples were extracted with 
phenol-chloroform and chloroform and then subjected to 
ethanol precipitation. The DNA pellets were resuspended in 
450 µl of water, treated with RNase, ethanol precipitated a 
second time, and finally resuspended in 200 µl of water. 
DNA (5.0 µg) was cut with 10 U of BamHI or 10 U of BamHI 
and 12 U of DpnI (Boehringer Mannheim) in a reaction 
volume of 100 µl overnight at 37°C. To check the DpnI 
activity, we incubated 4 µl of the DpnI reaction digest with 
500 ng of pUC19 DNA overnight at 37°C. Complete cutting 
of the pUC19 DNA was taken to indicate that the experi- 
mental DNA was also completely digested. The cellular 
DNA was then separated by electrophoresis on a 1% gel and 
transferred to a NYTRAN membrane following the method 
of Southern (61) as modified by the manufacturer (Schleicher 
& Schuell, Inc., Keene, N.H.). After the transfer, the 
NYTRAN membrane was neutralized in 2x SSC (0.3 M 
NaCl, 0.03 M sodium citrate) and baked under vacuum at 
80°C for 1 h. The NYTRAN membrane was prehybridized 
for 2 h in buffer containing 1% SDS, 5 mg of nonfat dried 
milk per ml, 0.5 mg of heparin per ml, 0.2 mg of sonicated, 
denatured salmon sperm DNA per ml, 60 mg of polyethylene 
glycol 8000, 5x SSPE (750 mM NaCl, 50 mM NaH2PO4, 5 
mM Na2EDTA), and 10% formamide (75). The membrane 
was then incubated overnight at 60°C with 2 x 10* to 5 x 10** 
cpm of EBV B95-8 BamHI-H, M-ABA BamHI-H (pPDL7), 
or pBR322 DNA probe per ml and radiolabeled by random 
priming (25) to a specific activity of 5 x 10® cpm/jtg, after 
which the filters were washed twice in 0.1x SSC-0.1% SDS 
at 65°C for 45 min and exposed to X-ray film for 24 h at 
-80°C with an intensifying screen. 

Plasmid constructions. The target BamHI-H plasmid 
(pSL77) used in the transient replication assays has been 
described previously (49), as have the effector DNA plas- 
mids, pPL17 (Zta), pMH48 (Rta), and pTS6 (Mta) (12, 34, 
47). The variant ori-Lyt target, pPL2A, contains the BglII-C 
fragment from M-ABA (pM-B2-C [56]) in which the BHLF1 
open reading frame has been disrupted by deletion of the 
NotI repeats. To express the B95-8 BMRF1 gene, a 3,026-bp 
BclI-EcoRI fragment from BamHI-M was ligated into a 
BglII-EcoRI-cleaved SV2neo derivative, pGH52, which con- 
tains a HindIII-BglII-HindIII linker inserted at the HindIII 
site of SV2neo. A BSLF1 expression vector, pDH131, was 
generated by first ligating a 2,819-bp BglII-BamHI fragment 
containing BSLF1 into the BamHI site of pUC18 and then 
transferring the open reading frame as a HindIII-BamHI 
fragment into HindIII-BamHI-cleaved pSV2neo. To express 
the EBV M-ABA strain BBLF4 gene, 10-mer HindIII linkers 
were ligated onto a 3,003-bp ^^mI fragment containing the 
entire BBLF4 open reading frame from the cosmid cM301-99 
(3, 56), and this fragment was then cloned into the HindIII 
site of the pBR322 derivative, pGH59. This plasmid was cut 
with Ryrll, the overhang was filled in with the Klenow DNA 
polymerase, and the DNA was recut with EcoRI. The 
resulting fragment containing BBLF4 was then ligated into 
the SmaI-EcoRI-cleaved SCMV Colburn IE94 expression 
plasmid, pGH70, to generate pEF54A. To express the 
M-ABA strain BBLF2 and BBLF3 open reading frames, we 
inserted the 3,052-bp Asp718-SalI fragment from the cosmid 
cM301-99 into Asp718-SalI-cleaved pUC19, generating 
pEF58. This plasmid was cut to completion with SalI and 
partially digested with BglII, and the resulting fragment was 
cloned into the BamHI-SalI-cleaved SCMV Colburn IE94 
expression plasmid pGH177, generating EF59A. 

To express the M-ABA strain BALF2 open reading frame, 
we moved a 3,912-bp BglII-EcoRI fragment that lacked the 
5' 1,005 bp of the BALF2 open reading frame from the 
cosmid cM966-20 (56) into the BglII-EcoRI-cleaved SCMV 
Colburn IE94 expression vector pGH179 to generate pEF55. 
The 5' end of BALF2 was amplified by a polymerase chain 
reaction utilizing the following primers: at the 5' end, 5'- 
CTAGGGATCCATGCAGGGTGCACAGACT-3', and at the 
internal BglII site, 5'-GCAAAGATCTGCGTGGACAC-3'. 
The polymerase chain reaction mixtures contained 10 µl of 
10x reaction buffer (Cetus), 10.0 µl of deoxynucleoside 
triphosphates (1.25 µM each dATP, dCTP, dGTP, and TTP), 
5 µl of each primer at 20 µM, 10 ng of the appropriate 
plasmid DNA, and 0.5 µl of Taq polymerase (Cetus). The 
reaction mixtures were incubated in a thermocycler for 2 min 
at 94°C, 2 min at 52°C, and 3 min at 72°C for 40 cycles. 
Aliquots of the reaction mixtures were analyzed on an 
agarose gel for the presence of the desired 1,005-bp frag- 
ment, after which the DNA was digested with BamHI and 
BglII and cloned into pEF55 at the BglII site. The correct 
orientation regenerated the intact BALF2 open reading 
frame (pEF56A). The B95-8 DNA polymerase expression 
plasmid was a gift from Don Coen (Harvard University) and 
contains the DNA polymerase open reading frame BALF5 
under the control of the simian virus 40 early promoter. The 
SalI-F subclone (pGD4) contains the SalI-F fragment from 
B95-8 cloned into pBR322 at the SalI site. The BamHI-BG 
clone (pDH33) contains the BamHI-B and BamHI-G frag- 
ments from P3HR-1 cloned into pBR322 at the BamHI site. 
The plasmid pJMH4 (BamHI-ΔBΔG) is an M-ABA subclone 
containing the 3' 6,107 bp of BamHI-B and the 5' 3,003 bp of 
BamHI-G cloned into the pHC79 vector. 

FIG. 1. Diagrammatic representation of the EBV genome showing the map locations of the genes of interest and the structure of ori-Lyt. 
(A) Six overlapping cosmids from the EBV strain M-ABA that were utilized to provide the entire EBV genome. (B) EBV genome. The 
locations of the BamHI-W internal repeat (striped rectangle) and the origins of replication, ori-P (triangle) and ori-Lyt (circles), as well as the 
locations of the three lytic-cycle transactivators, Zta, Rta, and Mta (open rectangles), are shown. (C) 6-kb BamHI-H fragment from EBV that 
served as the target in the replication assays. BamHI-H contains 3' sequences of the EBNA-2 gene, the entire BHLF1 open reading frame, 
ori-Lyt, and 5' sequences of the BHRF1 gene. (D) Minimal ori-Lyt, which is contained within a 690-bp SstII-NsiI fragment, is composed of 
three domains. Domain A contains the promoter and leader region of the BHLF1 gene and four Zta-binding sites (ZREs, shaded circles). 
Domain B contains two AT-rich palindromes, one 18 bp and the other 20 bp, designated by the double-headed arrow. Domain C is composed 
of an enhancer element, which contains one ZRE (shaded circle) and two binding sites for the Rta transactivator (shaded triangles). The 
central KpnI fragment is not required for ori-Lyt function (32). 
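
The two BALF2 primers listed above carry the restriction sites used in the subsequent cloning step. The following Python sketch simply checks where the BamHI (GGATCC) and BglII (AGATCT) recognition sequences fall in each primer and where the first ATG lies in the 5' primer. The primer sequences are copied from the text; the helper name and the dictionary of sites are illustrative assumptions.

```python
# Minimal sketch: locate restriction sites in the two PCR primers from the text.
# Recognition sequences are the standard ones for BamHI and BglII.

SITES = {"BamHI": "GGATCC", "BglII": "AGATCT"}

FIVE_PRIME_PRIMER = "CTAGGGATCCATGCAGGGTGCACAGACT"
INTERNAL_PRIMER = "GCAAAGATCTGCGTGGACAC"


def find_sites(primer: str) -> dict:
    """Map enzyme name -> 0-based index of its first site in the primer."""
    primer = primer.upper()
    return {name: primer.find(site) for name, site in SITES.items() if site in primer}


if __name__ == "__main__":
    print("5' primer sites:", find_sites(FIVE_PRIME_PRIMER))
    print("internal primer sites:", find_sites(INTERNAL_PRIMER))
    print("first ATG in 5' primer at index", FIVE_PRIME_PRIMER.find("ATG"))
```

In the 5' primer the ATG follows the BamHI site directly, which is consistent with the primer introducing a BamHI site immediately upstream of the start of the open reading frame; that reading of the design is an inference from the sequence, not a statement made in the text.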

RESULTS 

Establishment of an ori-Lyt replication assay. To identify 
viral proteins required to replicate ori-Lyt, we established a 
transient cotransfection assay similar to that described by 
Challberg (8) for HSV type 1 (HSV-1). In that assay, large 
restriction fragments of the HSV-1 genome were cotrans- 
fected into Vero cells with a plasmid containing the HSV-1 
origin of replication, ori-S, and DpnI sensitivity was used to 
discriminate between input and replicated DNA (55). The 
DpnI assay is based on the differential ability of DpnI to 
cleave the input bacterially synthesized DNA, which is 
methylated on the A residue of the DpnI cleavage site GATC, 
and the lack of cleavage when this methylation is lost after 
replication in eukaryotic cells. In the experiments described 
here, Vero cells (an African green monkey kidney cell line) 
were also used, and the exogenous EBV genome was 
provided by a set of six overlapping cosmids from the EBV 
strain M-ABA (56) (Fig. 1). The replication origin was 
provided by a plasmid carrying the EBV BamHI-H fragment, 
which encompasses the DS-L ori-Lyt. The transfection 
mixture also contained expression plasmids for the lytic- 
cycle transactivators, Zta, Rta, and Mta, to ensure adequate 
expression of the EBV early genes from the transfected 
cosmid clones. To determine the optimal assay conditions, 
we transfected Vero cells with the complete set of cosmids, 
the BamHI-H target, and increasing amounts (0 to 2.5 µg) of 
the expression plasmids encoding the transactivators. The 
isolated cellular DNA was cleaved with restriction enzymes, 
and the DNA fragments were separated by gel electropho- 
resis and transferred onto a nylon membrane. The input, 
transfected DNA was visualized by hybridization with a 
radiolabeled pBR322 probe. The BamHI-cleaved DNA is 
shown in Fig. 2A, and BamHI-plus-DpnI-cleaved DNA is 
shown in Fig. 2B. At a concentration of 2.5 µg of each 
expression plasmid, maximal replication of the target oc- 
curred (lane 4), as indicated by the presence of the DpnI- 
resistant ori-Lyt band (arrowed). If no transactivators were 
included, no detectable replication occurred (lane 1). Below 
2.5 µg of the transactivators, either no detectable replication 
occurred (lane 2) or minimal replication of the target oc- 
curred (lane 3). Replication was origin specific. Of all the 
input DNAs, only the BamHI-H target (and to a lesser extent 
the cosmids that contain DS-R and DS-L) showed DpnI 
resistance. 
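
The logic of the DpnI assay described above can be sketched in a few lines of Python. DpnI recognizes GATC and cuts only when the adenine is methylated, as it is in plasmid DNA grown in Dam-positive bacteria; DNA replicated in eukaryotic cells loses that mark and stays intact. The sketch treats the DNA as a linear string and treats methylation as all-or-nothing, ignoring hemimethylated intermediates; both simplifications, the function name, and the toy sequence are assumptions for illustration.

```python
# Minimal sketch of DpnI sensitivity: methylated input DNA is cut at GATC,
# unmethylated (replicated) DNA is not. DpnI leaves blunt ends (GA^TC).
import re


def dpni_digest(seq: str, dam_methylated: bool) -> list:
    """Return the fragments produced by DpnI on a linear DNA string."""
    seq = seq.upper()
    if not dam_methylated:
        return [seq]  # replicated DNA: DpnI resistant
    fragments, start = [], 0
    for match in re.finditer("GATC", seq):
        cut = match.start() + 2  # cut between A and T
        fragments.append(seq[start:cut])
        start = cut
    fragments.append(seq[start:])
    return fragments


if __name__ == "__main__":
    toy = "AAGATCTTGATCAA"  # hypothetical sequence with two GATC sites
    print("input (methylated):", dpni_digest(toy, dam_methylated=True))
    print("replicated (unmethylated):", dpni_digest(toy, dam_methylated=False))
```

On a Southern blot this difference appears as the loss of the full-length target band for input DNA and its persistence for replicated DNA, which is the DpnI-resistant band scored in the assay.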

Replication genes are located on three separate cosmids. 
Having established the assay conditions, we next deter- 
mined whether any of the cosmids were dispensable. The 
assay was repeated with one of the cosmids being omitted 
from each transfection mix (Fig. 3B). A DpnI-resistant band 
was observed in the absence of cMSalB, cM302-23, or 
cMSalA, indicating that these three cosmids were not re- 
quired for replication. On the other hand, if cM302-21, 
cM301-99, or cMB14 was removed, replication was not 
detected. Thus, in the presence of the lytic-cycle transacti- 
vators, only three of the cosmids, cMB14, cM301-99, and 
cM302-21, were required for replication of ori-Lyt. 

Vol. 66, 1992 EBV REPLICATION GENES 5033 

FIG. 2. Six overlapping cosmids support replication of the BamHI-H target in the presence of the lytic-cycle transactivators. (A) Southern blot of transfected cell DNA cut with BamHI and probed with radiolabeled pBR322 DNA. The top four bands of input DNA represent cosmids, and the central three bands represent the Zta, Rta, and Mta transactivators. Lane 1, no transactivators added; lanes 2 to 4, increasing amounts (0.1, 0.5, and 2.5 µg, respectively) of the transactivator expression plasmids. (B) As in panel A, except that the DNA was digested with BamHI plus DpnI. Replicated DpnI-resistant target is readily detected with 2.5 µg of the Zta, Rta, and Mta expression constructs (lane 4). 

FIG. 3. Three cosmids, cMB14, cM301-99, and cM302-21, are each required for replication of the target. (A) Set of overlapping cosmids. (B) Southern blot of transfected cell DNA digested with BamHI plus DpnI and probed with pBR322 to detect DpnI-resistant, replicated DNA. The positive control contained all six cosmids transfected with 2.5 µg of the expression constructs for Zta, Rta, and Mta. The target was resistant to cleavage with DpnI. Replication of the target was negative when the cosmid cM302-21, cM301-99, or cMB14 was omitted. Replication of the target was positive in the absence of the cosmid cMSalB, cM302-23, or cMSalA. In the negative control, all cosmids were omitted. 

FIG. 4. Three essential cosmids, cMB14, cM301-99, and cM302-21, are sufficient for replication of the target, but only in the presence of the Zta and Rta expression plasmids. (A) Set of overlapping cosmids with the three essential cosmids highlighted. (B) Transfected cell DNA cut with BamHI and DpnI, Southern blotted, and probed with BamHI-H to detect DpnI-resistant, replicated DNA. The requirement for Zta and Rta was examined in cells cotransfected with cMB14, cM301-99, cM302-21, and Mta. 

To establish that the three essential cosmids, cMB14, 
cM301-99, and cM302-21, were sufficient for replication of 
ori-Lyt, we cotransfected them with expression plasmids for 
Zta, Rta, and Mta. Efficient replication of the target occurred 
in the presence of these three cosmids (Fig. 4), demonstrat- 
ing that cMB14, cM301-99, and cM302-21 not only were 
required for replication, but were sufficient. Addition of both 
the Zta and Rta transactivators remained a requirement for 
replication. If either was left out of the transfection mix, 
detectable replication of BamHI-H did not occur. Thus, to 
obtain replication of ori-Lyt, three of the cosmids were 
required as well as two transcriptional activators, Zta and 
Rta. Zta and Rta could be functioning directly to transacti- 
vate the ori-Lyt promoter and enhancer or indirectly to 
increase expression from the cosmid-encoded genes needed 
for replication. The requirement for Mta could not be 
assessed in these experiments because Mta is encoded 
within one of the essential cosmids, cMB14. 
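
The drop-out strategy used in these experiments (omit one component at a time, then test whether the remaining essential set is sufficient) can be written as a small Python sketch. The `replicates` function below is a hypothetical stand-in for the experimental readout, hard-coded to match the outcomes reported in the text; it is not a model of the biology, and all names are illustrative.

```python
# Minimal sketch of leave-one-out analysis over the six cosmids.
# `replicates` encodes the reported outcomes: a DpnI-resistant band is seen
# only when cMB14, cM301-99, and cM302-21 are all present.

ALL_COSMIDS = frozenset(
    {"cMSalB", "cM302-23", "cMSalA", "cM302-21", "cM301-99", "cMB14"}
)
REPORTED_ESSENTIAL = frozenset({"cMB14", "cM301-99", "cM302-21"})


def replicates(cosmids: frozenset) -> bool:
    """Hypothetical readout: True if replicated target would be detected."""
    return REPORTED_ESSENTIAL <= cosmids


def essential_components(components: frozenset, assay) -> set:
    """Components whose individual omission abolishes replication."""
    return {c for c in components if not assay(components - {c})}


if __name__ == "__main__":
    essential = essential_components(ALL_COSMIDS, replicates)
    print("essential:", sorted(essential))
    # Sufficiency check: do the essential cosmids alone support replication?
    print("sufficient:", replicates(frozenset(essential)))
```

Leave-one-out omission identifies components that are individually necessary; the separate sufficiency test matters because redundant functions spread across two dispensable cosmids would not be revealed by single omissions.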

ori-Lyt replication requires the EBV genes BALF5, BALF2, 
BMRF1, and BSLF1. The seven genes of HSV-1 whose 
products are essential and sufficient to replicate an HSV-1 
origin-containing target plasmid are the DNA polymerase, 
UL30; the single-stranded DNA-binding protein, UL29; the 
tripartite helicase-primase complex containing UL5, UL8, 
and UL52; the DNA polymerase processivity factor, UL42; 
and the origin-binding protein, UL9. Table 1 shows the 
seven HSV-1 replication genes, their potential EBV ho- 
mologs, and the percent homology which they share (52). 

TABLE 1. Putative EBV homologs of the essential HSV replication proteins 

Function                                         HSV gene   EBV ORF(a)   Identity (%) 
DNA polymerase (POL)                             UL30       BALF5        33 
Polymerase processivity factor (PPF)             UL42       BMRF1        (b) 
Single-stranded DNA-binding protein (SSB)        UL29       BALF2        25 
Helicase (HEL) [complex]                         UL5        BBLF4        34 
Primase (PRI) [complex]                          UL52       BSLF1        23 
Primase-associated factor (PAF)                  UL8        BBLF2/3 
Origin-binding protein (OBP)                     UL9        ? 

(a) ORF, open reading frame. 

FIG. 5. Cosmids cMB14 and cM302-21 each encode only two essential replication genes. (A) Set of cosmids; cMB14, cMSalC, and 
cM302-21 are highlighted. The locations of the genes on cMB14, cMSalC, and cM302-21 that have homology with HSV-1 replication genes 
are also shown. In panels B and C, transfected cell DNA was cut with BamHI and DpnI, Southern blotted, and probed with BamHI-H to 
detect DpnI-resistant, replicated DNA. (B) Lane 1, the three cosmids, cMB14, cM301-99, and cM302-21, plus the transactivator expression 
plasmids. Lane 2, no Zta added. Lane 3, the cosmid cMB14 was replaced by the cosmid cMSalC. Lane 4, the cosmid cMB14 was replaced 
by expression plasmids encoding BMRF1, the polymerase processivity factor, and BSLF1, the primase homolog. (C) Lane 1, the three 
cosmids, cMB14, cM301-99, and cM302-21, plus the transactivator expression plasmids. Lane 2, no Zta added. Lane 3, the cosmid cM302-21 
was replaced by expression plasmids encoding BALF5, the DNA polymerase, and BALF2, the single-stranded DNA-binding protein 
homolog. Lane 4, minus BALF5. Lane 5, minus BALF2. 

Upon inspection of the map location of those EBV genes 
that had potential homology with the HSV-1 replication 
genes, it became apparent that they were all located within 
the three cosmids that were essential for replication in our 
assay. To determine whether these genes indeed encoded 
functional replication proteins, the EBV DNA polymerase, 
BALF5, and its processivity factor, BMRF1, as well as 
BSLF1, BBLF4, and BALF2 were placed under the control 
of the strong simian virus 40 early or CMV major immediate- 
early promoters in eukaryotic expression vectors and tested 
in substitution experiments. First, the cosmid cMB14 was 
replaced with another cosmid, cMSalC, which has five open 
reading frames in common with cMB14. This cosmid was 
able to substitute functionally for cMB14 (Fig. 5B, lane 3), 
indicating that the replication genes provided by cMB14 
were located within the region that overlaps with cMSalC. 
The common sequences contain BMRF1 and BSLF1 plus 
BMLF1 (Mta), BMRF2 (a late gene), and BORF2 and BaRF1, 
which encode the large and small subunits of ribonucleotide 
reductase (3, 24). The cosmid cMB14 was next replaced by 
expression plasmids for the two genes, BSLF1 and BMRF1, 
that were potential homologs for HSV replication genes. 
Replication was positive in the presence of BSLF1 and 
BMRF1, although the replication signal was reduced com- 
pared with that obtained with cMB14 or cMSalC (Fig. 
5B, lane 4). Transfection of the three essential cosmids in 
the presence of the lytic-cycle transactivators served as the 
positive control (Fig. 5B, lane 1). Removal of Zta from the 
transfection mix abolished detectable replication of the 
target and served as the negative control (Fig. 5B, lane 2). In 
a parallel experiment, the cosmid cM302-21 was successfully 
replaced by expression plasmids for the two predicted rep- 
lication genes that mapped within cM302-21, the DNA 
polymerase, BALF5, and BALF2, the putative single- 
stranded DNA-binding protein (Fig. 5C, lane 3). If either 
BALF5 (Fig. 5C, lane 4) or BALF2 (Fig. 5C, lane 5) was 
omitted, replication of the target did not occur. Therefore, 
the cosmid cMB14 encodes only two essential replication 
proteins, BSLF1, the primase homolog, and BMRF1, the 
processivity factor. Similarly, the cosmid cM302-21 also 
encodes only two replication proteins, the DNA polymerase, 
BALF5, and BALF2, the single-stranded DNA-binding pro- 
tein homolog. 

ori-Lyt replication also requires BBLF4, BBLF2/3, and 
SalI-F. A diagram of the remaining essential cosmid, cM301- 
99, showing the map locations of relevant open reading 
frames is presented in Fig. 6A. The only gene encoded by 
cM301-99 which has recognized sequence homology to a 
replication protein of HSV-1 is BBLF4. However, when this 
cosmid was replaced by an expression plasmid for BBLF4, 
no detectable replication occurred (data not shown). Repli- 
cation of the target was restored, however, if cM301-99 was 
replaced by the BBLF4 expression plasmid and two 
cM301-99 subclones, SalI-F and BamHI-BG, neither of 
which contains an intact BBLF4 open reading frame (Fig. 
6B, lane 3). Removal of either BamHI-BG (Fig. 6B, lane 4) 
or SalI-F (Fig. 6B, lane 5) abolished detectable replication of 
the target. The three cosmids plus transactivators again 
served as the positive control (Fig. 6B, lane 1), while the 
negative control lacked Zta (Fig. 6B, lane 2). Removal of 
BBLF4 also abolished detectable replication of the target 
(data not shown). Because two cM301-99 subclones and the 
BBLF4 expression plasmid were required to replace cM301- 
99, at least three replication genes appear to be provided by 
this cosmid. 

To determine the gene or genes provided by BamHI-BG, 
we used subclones of this DNA fragment. The clone 
BamHI-ΔBΔG (pJMH4) was able to substitute for BamHI-BG 
in the replication assay (Fig. 6C, lane 4). Both BamHI-BG 
and BamHIABAG contain the two open reading frames 
BBLF2 and BBLF3 (Fig. 6A), which are potential UL9 and 
UL8 homologs based on their genome locations (52). The 
linkage with UL8 was further strengthened by the recogni- 
tion that BBLF2 has a stretch of 55 amino acids which are 

FIG. 7. BBLF2/3 shares genomic location and limited amino acid sequence similarity with UL8 of HSV, gene 51 of VZV, and UL102 of 
HCMV. (A) Relationship of the BBLF2 and BBLF3 open reading frames to the open reading frames in the equivalent regions of the HSV, 
VZV, and HCMV genomes (11, 19, 53). Potential ATP-binding motifs are indicated (•), as is a region of amino acid similarity ([x]). (B) 
Comparison of the sequences denoted in panel A by the symbol [x]. Identical residues are indicated by asterisks (*), and equivalent residues 
are indicated by vertical lines (|). 



viral lytic-cycle replication was selectively inhibited by 
drugs such as phosphonoacetic acid and acyclovir (16, 45). 
Sequencing of the EBV genome revealed an open reading 
frame (BALF5) whose protein product shares 33% sequence 
identity with the HSV DNA polymerase (3, 52) and like the 
HSV polymerase has several regions that are highly con- 
served in prokaryotic and eukaryotic viral DNA polymer- 
ases and in mammalian DNA polymerase alpha (64). More 
recently, BALF5 has been synthesized in vitro and shown to 
exhibit polymerase activity (50). The BMRF1 gene is the 
positional equivalent of the HSV UL42 polymerase proces- 
sivity factor (20, 27, 29, 36). Although the EBV protein has 
no significant sequence similarity to UL42, it is a functional 
homolog. The BMRF1 protein copurifies with the EBV 
polymerase and has been shown to stimulate polymerase 
activity in vitro (40, 46). 
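
Identity figures such as the 33% quoted above for BALF5 and the HSV DNA polymerase are computed from a pairwise alignment. The Python sketch below shows one common way to do that arithmetic, counting identical residues over aligned, non-gap columns. The choice of denominator is an assumption (other conventions divide by alignment length or by the shorter sequence), the paper does not state which convention underlies its numbers, and the example sequences are made up.

```python
# Minimal sketch: percent identity over the non-gap columns of a pairwise alignment.
# The two inputs must already be aligned (equal length, '-' marking gaps).

def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Identical residues / columns where neither sequence has a gap, times 100."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Hypothetical toy alignment, not real BALF5 or UL30 sequence.
    print(percent_identity("MKT-LV", "MRTALV"))
```

Because the result depends on both the alignment and the denominator, identity percentages from different sources are only comparable when those choices match.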

The four other EBV genes identified as being required for 
ori-Lyt replication in the cotransfection-replication assay are 
BALF2, BBLF4, BSLF1, and BBLF2/3. The BALF2 gene 
product has 25% overall identity with the HSV single- 
stranded DNA-binding protein (UL29). BALF2 also con- 
tains a series of motifs that are highly conserved in the HSV, 
VZV, SCMV, and HCMV single-stranded DNA-binding 
proteins (1, 10, 65). The HSV and EBV proteins are virtually 
identical in size at 1,196 and 1,128 amino acids, respectively. 
The BBLF4 and BSLF1 proteins have significant sequence 
identity with the HSV UL5 and UL52 genes. UL5 and UL52 
form a tripartite complex with UL8 in HSV-infected cells 
and in insect cells coinfected with recombinant baculovi- 
ruses expressing these genes (15, 21). This complex has both 
helicase and primase activities. Recently, helicase and pri- 
mase activities were demonstrated in a bipartite complex of 
UL5 and UL52 (6, 22). Analysis of the UL5 amino acid 
sequence has revealed six motifs, including an ATP-binding 
motif, that are present in other families of helicases (28, 37, 
44), and hence UL5 and BBLF4 are predicted to function as 
helicases. Indeed, individual mutations in each of the six 
conserved motifs of UL5 abolish its ability to complement a 
replication-deficient null mutant in a transient replication 
assay (76). UL52 is the putative primase of HSV, although 
an association with UL8 and UL5 may be required for 
complete activity. BSLF1 is the candidate primase of EBV. 
Furthermore, we believe that BBLF2/3 is the homolog of 
HSV UL8, the third member of the helicase-primase com- 
plex. BBLF3 and BBLF2 are the positional equivalents of 
HSV UL8 and UL9. However, in EBV, these two open 
reading frames are believed to be spliced into a single 
transcript and hence would presumably encode only one 
protein. Previous reports noted no significant similarity 
between BBLF2/3 and UL8 or UL9. However, a visual 
alignment (Fig. 7) reveals a 55-amino-acid region of BBLF2 
that is conserved in HSV UL8, VZV 52, and HCMV UL102. 
For this reason, we believe that BBLF2/3 is likely to be the 
homolog of HSV UL8. The spliced BBLF2/3 transcript 
would encode 709 amino acids compared with 750 for HSV 
UL8. 

FIG. 8. Six cloned replication genes plus SalI-F and the lytic-cycle transactivators support replication of the target; the BHLF1 gene product is not required for replication. The ori-Lyt-containing target was transfected with expression constructions for the six cloned genes: BMRF1, the polymerase processivity factor; BSLF1, the primase homolog; BBLF4, the helicase homolog; BBLF2/3, the potential UL8 homolog; BALF5, the DNA polymerase; and BALF2, the single-stranded DNA-binding protein homolog. SalI-F was also added, as well as expression constructions for the three transactivators. A Southern blot of transfected cell DNA digested with BamHI and DpnI was probed with BamHI-H to detect DpnI-resistant, replicated DNA. Lane 1, an ori-Lyt target (ΔNotI) carrying a deletion within the BHLF1 open reading frame. Lane 2, no Zta added. Lane 3, the standard BamHI-H target. Lane 4, the standard BamHI-H target in the presence of Zta, Rta, and Mta. Lane 5, no Mta added. 

At least one other gene encoded by EBV SalI-F is 
required for ori-Lyt replication in the cotransfection assay. 
Interestingly, SalI-F encodes the latency origin-binding pro- 
tein EBNA-1. However, EBNA-1 is thought not to be the 
required gene product because SalI-F cannot be replaced by 
other subclones that contain the EBNA-1 open reading 
frame (unpublished data). One of the replication proteins 
that has not been identified is an ori-Lyt origin-binding 
protein equivalent to the HSV origin-binding protein, UL9 
(23, 41, 54). Since ori-Lyt is unrelated in sequence to HSV 
ori-S and ori-L, it is not surprising that comparative analyses 
have not revealed a homolog for UL9. It is possible that the 
gene for the EBV origin-binding protein is located in SalI-F 
or that one of the proteins already shown to be required in 
the cotransfection assay is multifunctional and also provides 
origin-binding activity. Another alternative is that the origin- 
binding protein is not virally encoded and that a cellular 
factor serves in this capacity. 

Hie EBV-encoded proteins identified in this study as being 
required for ori-Lyt replication are homologs of HSV proteins 
that participate directly in DNA replication. Herpesviruses 
also encode a number of enzymes, alkaline nuclease, ribo- 
nucleotide reductase, thymidine kinase, dUTPase, and 
uracil DNA glycosylase, that are involved in nucleotide 
metabolism and play an ancillary role in DNA synthesis 
(reviewed in reference 66). Genetic studies with HSV indi- 
cate that under certain conditions such as growth at high 
temperature, in growth-arrested cells, or in the animal host, 
HSV DNA replication may become dependent on the virally 
encoded alkaline nuclease, thymidine kinase, and ribonucle- 
otide reductase. In our cotransfection-replication assays, the 
cosmid cMB14 could be replaced by a cosmid cMSalC which 
has five open reading frames in common with cMB14. The 
common genes are BMRF1 (the polymerase processivity 
factor), BSLF1 (the putative primase homolog), BMLF1 (the 
Mta transactivator), BMRF2, BORF2, and BaRF1. When 
these cosmids were replaced by expression vectors for 
BMRF1 and BSLF1, replication of ori-Lyt did occur but at a 
reduced level. This observation raises the possibility that 
cMB14 and cMSalC were providing an additional nonessen- 
tial, replication-related function. Mta can be discounted 
since it was being provided exogenously by an expression 
vector, and BMRF2 is a late gene and therefore not a likely 
candidate for a replication-related function. The remaining 
common open reading frames, BORF2 and BaRF1, encode 
the large and small subunits of ribonucleotide reductase. 
One interpretation of the data is that the EBV-encoded 
ribonucleotide reductase is providing an auxiliary function 
that increases replication efficiency in these assays. The 
replication signal was also reduced when the BamHI-H 
ori-Lyt target was replaced with a modified target carrying a 
deletion in the BHLF1 gene. In this case, the apparent 
reduction in replication efficiency may simply represent 
decreased hybridization of the probe to the target DNA 
which no longer contains the NotI repeats. 

The minimal ori-Lyt as defined by Hammerschmidt and 
Sugden (32) consists of three essential subdomains (Fig. 1). 
Two of these, the BHLF1 promoter and the upstream 
enhancer, also function in transcriptional regulation of the 
flanking BHLF1 and BHRF1 genes. Characterized replica- 
tion origins are commonly associated with transcriptional 
elements (reviewed in reference 19). These elements may be 
integrally linked to replication functions or may serve an 
auxiliary role by contributing to replication efficiency. For 
example, the cloned minimal HSV-1 origin, ori-S, can func- 
tion in transient replication assays. However, if the flanking 
sequences containing the divergent promoters for the imme- 
diate-early IE175 and IE68 genes are also included, replica- 
tion of the target is substantially increased (69). The EBV 
latency origin of replication (ori-P) consists of two domains, 
an EBNA-1-dependent enhancer and a region of dyad sym- 
metry that is the site of initiation of replication (26, 72). With 
ori-P, replication is strictly dependent on the presence of the 
enhancer domain. The recently identified CMV origin of 
replication contains multiple binding sites for transcription 
factors (2, 33), although their requirement for replication has 
yet to be addressed by mutational analyses. 

The BHLF1 promoter that constitutes one of the essential 
domains of EBV ori-Lyt contains four binding sites for Zta 
and is efficiently activated by Zta in cotransfection assays. 
The ori-Lyt enhancer contains two binding sites for a second 
viral transcriptional activator, Rta, and one binding site for 
Zta (31, 48). The enhancer is strongly activated by Rta and 
responds synergistically to the combination of Rta and Zta 
(14). In the cotransfection-replication assays, ori-Lyt repli- 
cation was dependent on the presence of Zta, Rta, and Mta. 
In the assays using the cosmid clones, the transactivators 
would have been required for efficient expression of the 
replication genes encoded within the cosmids. Even in the 
final assays described here, at least one of the genes neces- 
sary for replication was provided on SalI-F, and again it is 
likely that the transactivators would be needed for expres- 
sion from this plasmid. In HSV, the immediate-early trans- 
activators were required for replication of ori-S in the 
transient replication assay when viral DNA fragments were 
used to provide the replication functions. In contrast, when 
each of the seven replication genes was expressed from the 
strong constitutive HCMV promoter, the requirement for 
these transcription factors in the transient replication assay 
was alleviated (35). In EBV, however, the inclusion of the 
promoter and enhancer elements within the defined limits of 
the minimal ori-Lyt makes it highly probable that Zta and 
Rta contribute to replication directly through transcriptional 
activation, through DNA binding, or through interactions 
with components of the replication complex. The role of 
these transactivators in ori-Lyt replication can be addressed 
more directly when the full complement of EBV replication 
genes has been identified. 